A Chinese Few-Shot Text Classification Method Utilizing Improved Prompt Learning and Unlabeled Data

Authors

Abstract

Insufficiently labeled samples and low generalization performance have become significant natural language processing problems, drawing concern for few-shot text classification (FSTC). Advances in prompt learning have significantly improved the performance of FSTC. However, prompt learning methods typically require the pre-trained language model and the tokens of its vocabulary list for training, while different language models have different token coding structures, making it impractical to build effective Chinese prompt learning methods from previous approaches related to English. In addition, a majority of current prompt learning methods do not make use of existing unlabeled data, thus often leading to unsatisfactory performance in real-world applications. To address the above limitations, we propose a novel Chinese FSTC method called CIPLUD that combines an improved prompt learning method with existing unlabeled data, which are used to classify a small amount of Chinese text data. We use a Chinese pre-trained language model to build two modules: the Multiple Masks Optimization-based Prompt Learning (MMOPL) module and the One-Class Support Vector Machine-based Unlabeled Data Leveraging (OCSVM-UDL) module. The former generates prefixes with multiple masks and constructs suitable prompt templates for Chinese labels. It optimizes the random token combination problem during label prediction with joint probability and length constraints. The latter, by establishing an OCSVM model in the trained text vector space, selects reasonable pseudo-label data for each category from a large amount of unlabeled data. After selecting the pseudo-label data, we mix them with the few-shot annotated data to obtain brand new training data, and then repeat the steps of the two modules as an iterative semi-supervised optimization process. The experimental results on four Chinese FSTC benchmark datasets demonstrate that our proposed solution outperformed other prompt learning methods, with an average accuracy improvement of 2.3%.
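
The abstract does not give the exact scoring formula, so the following is only a minimal sketch of one way to read "joint probability and length constraints" for multi-mask label prediction. It assumes each mask position yields a probability distribution over single Chinese characters, that a candidate label fills the masks left to right, and that the score is the average log-probability of its characters. The function names `score_label` and `predict_label` and all probabilities are hypothetical, chosen for illustration and not taken from the paper.

```python
import math


def score_label(label, mask_probs):
    """Average joint log-probability of `label` filling the first len(label) masks.

    Assumption: one character per mask, masks filled left to right.
    """
    # Length constraint (assumed form): the label must fit in the available masks.
    if len(label) == 0 or len(label) > len(mask_probs):
        return float("-inf")
    log_p = 0.0
    for position, char in enumerate(label):
        p = mask_probs[position].get(char, 0.0)
        if p <= 0.0:
            return float("-inf")  # character impossible at this mask
        log_p += math.log(p)
    return log_p / len(label)


def predict_label(labels, mask_probs):
    """Pick the candidate label with the highest joint score."""
    return max(labels, key=lambda label: score_label(label, mask_probs))


# Toy distributions (made-up numbers) for a prompt with two masks.
mask_probs = [
    {"体": 0.40, "财": 0.35, "科": 0.25},
    {"经": 0.45, "育": 0.30, "技": 0.25},
]
labels = ["体育", "财经", "科技"]

# Taking the most likely character at each mask independently gives a
# combination that is not a valid label.
greedy = "".join(max(dist, key=dist.get) for dist in mask_probs)

# Scoring whole labels jointly only ever returns a valid label.
prediction = predict_label(labels, mask_probs)
print(greedy, prediction)
```

In this toy case the per-mask argmax produces a character pair that is not among the candidate labels, which is the kind of random token combination the joint scoring is meant to avoid.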

Similar resources

Few-shot Classification by Learning Disentangled Representations

Machine learning has improved state-of-the-art performance in numerous domains by using large amounts of data. In reality, labelled data is often not available for the task of interest. A fundamental problem of artificial intelligence is finding a representation that can generalize to never-before-seen classes. In this research, the power of generative models is combined with disentangled repr...

Learning Classification with Unlabeled Data

One of the advantages of supervised learning is that the final error metric is available during training. For classifiers, the algorithm can directly reduce the number of misclassifications on the training set. Unfortunately, when modeling human learning or constructing classifiers for autonomous robots, supervisory labels are often not available or too expensive. In this paper we show that we ...

An Improved CHI Feature Selection Method for Chinese Text Classification

We propose a feature selection method named ICHI, based on an improved CHI. Classification experiments show that the feature extraction effect of the ICHI method is better than that of the traditional CHI method when they are used to select features for SVM and KNN classification, and that the ICHI method can enhance accuracy in text classification and is well suited to feature extraction.

Learning phrase patterns for text classification using a knowledge graph and unlabeled data

This paper explores a novel method for learning phrase pattern features for text classification, employing a mapping of selected words into a knowledge graph and self-training over unlabeled data. Using Support Vector Machine classification, we obtain improvements over lexical and fully-supervised phrase pattern features in domain and intent detection for language understanding, particularly in...

Few-shot Learning

Though deep neural networks have shown great success in the large data domain, they generally perform poorly on few-shot learning tasks, where a classifier has to quickly generalize after seeing very few examples from each class. The general belief is that gradient-based optimization in high capacity classifiers requires many iterative steps over many examples to perform well. Here, we propose ...

Journal

Journal title: Applied Sciences

Year: 2023

ISSN: 2076-3417

DOI: https://doi.org/10.3390/app13053334